Querying Probabilistic Information Extraction

Authors

Daisy Zhe Wang

Michael J. Franklin

Minos N. Garofalakis

Joseph M. Hellerstein

Abstract

Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone Information Extraction (IE) techniques to identify and label entities within blocks of text; the resulting entities are then imported into a standard database and processed using relational queries. This two-part approach, however, suffers from two main drawbacks. First, IE is inherently probabilistic, but traditional query processing does not properly handle probabilistic data, which reduces answer quality. Second, separating IE from query processing introduces performance inefficiencies. In this paper, we address these two problems by building on an in-database implementation of a leading IE model: Conditional Random Fields (CRFs) with the Viterbi inference algorithm. We develop two different query approaches on top of this implementation. The first uses deterministic queries over maximum-likelihood extractions, with optimizations that push the relational operators into the Viterbi algorithm. The second extends the Viterbi algorithm to produce a set of possible extraction “worlds”, from which we compute top-k probabilistic query answers. We describe these approaches and explore the efficiency and effectiveness trade-offs between them on two datasets.
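Both query approaches above rest on dynamic programming over a linear-chain CRF. The sketch below is illustrative only and is not the authors' in-database implementation: it assumes the CRF potentials are already available as per-token emission and pairwise transition log-scores, and every name in it (`viterbi`, `top_k_viterbi`, `log_partition`, `answer_probability`, the `allowed` parameter) is hypothetical. `viterbi` returns the maximum-likelihood labeling, and its `allowed` argument shows one simple way to push an equality selection into the recursion instead of filtering the result afterwards. `top_k_viterbi` keeps the k best partial paths per label to enumerate the most likely extraction worlds, and `answer_probability` normalizes those worlds with the forward algorithm and sums the mass of the ones that satisfy a query predicate.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical input format: emission[t][y] is the log-potential of label y at
# token t; transition[p][y] is the log-potential of label p followed by label y.
Scores = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(emission: Scores, transition: Scores,
            allowed: Optional[Dict[int, int]] = None) -> Tuple[float, List[int]]:
    """Best (log-score, label sequence). `allowed` maps a token position to the
    only label permitted there, pushing an equality selection into the DP."""
    n, k = len(emission), len(transition)
    allowed = allowed or {}

    def ok(t: int, y: int) -> bool:
        return allowed.get(t, y) == y

    # score[y]: best log-score of any prefix that ends in label y.
    score = [emission[0][y] if ok(0, y) else -math.inf for y in range(k)]
    back: List[List[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for y in range(k):
            best_prev = max(range(k), key=lambda p, y=y: score[p] + transition[p][y])
            ptr.append(best_prev)
            new_score.append(
                (score[best_prev] + transition[best_prev][y] + emission[t][y])
                if ok(t, y) else -math.inf)
        score = new_score
        back.append(ptr)
    # Follow back-pointers from the best final label.
    last = max(range(k), key=lambda y: score[y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


def top_k_viterbi(emission: Scores, transition: Scores,
                  k_best: int) -> List[Tuple[float, List[int]]]:
    """The k_best most likely label sequences ("worlds"), best first."""
    n, k = len(emission), len(transition)
    # cells[t][y]: up to k_best (score, prev_label, prev_rank) entries, best first.
    cells = [[[(emission[0][y], -1, -1)] for y in range(k)]]
    for t in range(1, n):
        row = []
        for y in range(k):
            cands = [(s + transition[p][y] + emission[t][y], p, r)
                     for p in range(k)
                     for r, (s, _, _) in enumerate(cells[t - 1][p])]
            row.append(heapq.nlargest(k_best, cands))
        cells.append(row)
    finals = heapq.nlargest(k_best, [(s, y, r) for y in range(k)
                                     for r, (s, _, _) in enumerate(cells[-1][y])])
    worlds = []
    for s, y, r in finals:
        path = [y]
        for t in range(n - 1, 0, -1):
            _, y, r = cells[t][y][r]  # step back to (prev_label, prev_rank)
            path.append(y)
        worlds.append((s, path[::-1]))
    return worlds


def log_partition(emission: Scores, transition: Scores) -> float:
    """log Z via the forward algorithm, used to turn scores into probabilities."""
    k = len(transition)
    alpha = list(emission[0])
    for t in range(1, len(emission)):
        alpha = [emission[t][y] + logsumexp([alpha[p] + transition[p][y] for p in range(k)])
                 for y in range(k)]
    return logsumexp(alpha)


def answer_probability(emission: Scores, transition: Scores, k_best: int,
                       predicate: Callable[[List[int]], bool]) -> float:
    """Probability mass of the top-k worlds satisfying `predicate`: a lower
    bound on the exact answer probability, exact once k_best covers all worlds."""
    log_z = log_partition(emission, transition)
    return sum(math.exp(s - log_z)
               for s, path in top_k_viterbi(emission, transition, k_best)
               if predicate(path))
```

For example, with labels 0 = O and 1 = PER, `answer_probability(E, T, 10, lambda path: path[2] == 1)` estimates the probability that the third token is a person name from the ten most likely worlds. Keeping k partial paths per label makes the top-k pass more expensive than plain Viterbi, with cost growing in k, which is one concrete form of the efficiency versus answer-quality trade-off described above.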

Similar resources

Extracting and Querying Probabilistic Information in BayesStore

Knowledge Extraction and Joint Inference Using Tractable Markov Logic

The development of knowledge base creation systems has mainly focused on information extraction without considering how to effectively reason over their databases of facts. One reason for this is that the inference required to learn a probabilistic knowledge base from text at any realistic scale is intractable. In this paper, we propose formulating the joint problem of fact extraction and proba...

10 Years of Probabilistic Querying - What Next?

Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but—so far—both areas developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineag...

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...

Probabilistic Graphical Models and their Role in Databases

Probabilistic graphical models provide a framework for compact representation and efficient reasoning about the joint probability distribution of several interdependent variables. This is a classical topic with roots in statistical physics. In recent years, spurred by several applications in unstructured data integration, sensor networks, image processing, bio-informatics, and code design, the ...

Journal title:

PVLDB

Volume 3

Published: 2010